6  Appendix C

6.1 Setup

6.1.1 Install Packages

We install the following packages using the groundhog package manager to increase computational reproducibility.

options(repos = c(CRAN = "https://cran.r-project.org")) 

if (!requireNamespace("groundhog", quietly = TRUE)) {
    install.packages("groundhog")
}

pkgs <- c("magrittr", "data.table", "stringr", "Rmisc", "ggplot2", "lmtest", "sandwich", "glue", "knitr")

groundhog::groundhog.library(pkg = pkgs,
                             date = "2024-08-01")

rm(pkgs)

6.1.2 Read Data

data      <- readRDS(file="../data/processed/full.Rda")
timeSpent <- data.table::fread(file = "../data/raw/PageTimes-2021-09-15.csv")
raw       <- data.table::fread(file="../data/raw/all_apps_wide_2021-09-15.csv")

6.1.3 Design

We define some design features in the following:

colors <- c("#F3B05C", "#1E4A75", "#65B5C0", "#AD5E21")

layout <- theme(panel.background = element_rect(fill = "white"),
                legend.key = element_rect(fill = "white"),
                panel.grid.major.y = element_line(colour = "grey", 
                                                  linewidth = 0.25),
                axis.ticks.y = element_blank(),
                panel.grid.major.x = element_blank(),
                axis.line.x.bottom = element_line(colour = "#000000", 
                                                  linewidth = 0.5),
                axis.line.y.left = element_blank(),
                plot.title = element_text(size = rel(1))
)

We examine the heterogeneous effects by estimating a triple interaction effect regression:

\[ \begin{aligned} y_{it} = &\ \alpha_{\text{baseline}} + \sum_{\text{treat}} \alpha_{\text{treat}} \text{treat}_i + \beta_{\text{baseline}} \text{part2}_t + \sum_{\text{treat}} \beta_{\text{treat}} \text{treat}_i \times \text{part2}_t \\ &\ + \theta_{\text{baseline}} (D_i \times \text{part2}_t) + \sum_{\text{treat}} \theta_{\text{treat}} (D_i \times \text{treat}_i \times \text{part2}_t) \\ &\ + \delta_{\text{baseline}} D_i + \sum_{\text{treat}} \delta_{\text{treat}} (D_i \times \text{treat}_i) + \gamma X_i + \epsilon_{it} \end{aligned} \]

The figure C.1a displays the estimators \(\beta_{treat}\) and \(\theta_{treat}\). The triple interaction effects \(\theta_{treat}\) are labelled with education DDD, temperature DDD, forecast usage DDD, credible DDD, accurate DDD, and female DDD in this case. The double interaction effects \(\beta_{treat}\) are labelled with lower education, lower temperature, less forecast usage, less credible, less accurate, and not female in this case. For example, to determine the total treatment effect for female on \(b\) , one must add the estimators \(\beta_{treat}\) (\(b\) (not female)) and \(\delta_{treat}\) (\(b\) (female DDD)). The same principle can be applied to the other two figures.

In figure C.1a, we pool across the different information treatments (interval, best guess, both) and only consider the heterogeneous effect of contradiction vs. confirmation. treat refers to contradiction while confirmation is used as the baseline.

In figure C.1b and C.1c, we test the heterogeneous effects of the different information treatments (interval, best guess, both) separately for confirmation and contradiction treatment arms. The best guess information treatment serves as the baseline and treat refers to the information treatments interval or both.

6.2 Works:

data[, `:=`(
  female_con  = as.integer(female == 1 & treated == 1 & surprise == 1),
  female_both = as.integer(female == 1 & treated == 1 & communication == "both"),
  female_int  = as.integer(female == 1 & treated == 1 & communication == "interval"),
  
  high_accuracy_con  = as.integer(high_accuracy == 1 & treated == 1 & surprise == 1),
  high_accuracy_both = as.integer(high_accuracy == 1 & treated == 1 & communication == "both"),
  high_accuracy_int  = as.integer(high_accuracy == 1 & treated == 1 & communication == "interval"),
  
  high_credibility_con  = as.integer(high_credibility == 1 & treated == 1 & surprise == 1),
  high_credibility_both = as.integer(high_credibility == 1 & treated == 1 & communication == "both"),
  high_credibility_int  = as.integer(high_credibility == 1 & treated == 1 & communication == "interval"),
  
  high_usage_con  = as.integer(high_usage == 1 & treated == 1 & surprise == 1),
  high_usage_both = as.integer(high_usage == 1 & treated == 1 & communication == "both"),
  high_usage_int  = as.integer(high_usage == 1 & treated == 1 & communication == "interval"),
  
  high_temperature_con  = as.integer(high_temperature == 1 & treated == 1 & surprise == 1),
  high_temperature_both = as.integer(high_temperature == 1 & treated == 1 & communication == "both"),
  high_temperature_int  = as.integer(high_temperature == 1 & treated == 1 & communication == "interval"),
  
  high_education_con  = as.integer(high_education == 1 & treated == 1 & surprise == 1),
  high_education_both = as.integer(high_education == 1 & treated == 1 & communication == "both"),
  high_education_int  = as.integer(high_education == 1 & treated == 1 & communication == "interval")
)]
data[, treatment := paste(ifelse(test = surprise, yes = "contr", no = "conf"),
                          communication, sep = " ")]

a = AA2 & b = AA1

covariates_dem <- c("age_35_52",
            "age_53_plus",
            "female", 
            "high_education",
            "high_income",
            "married",
            "parentship")

covariates_all <- c(covariates_dem,
                    "high_temperature",
                    "high_usage",
                    "high_general_risk",
                    "high_weather_risk",
                    "high_accuracy",
                    "high_credibility")

covariates_list <- list(NULL, covariates_dem, covariates_all)

DVs <- c("a", "b")
IVs <- c("female", "high_accuracy", "high_credibility", "high_usage", "high_temperature", "high_education")
# Define a function to create the list of models
create_models <- function(responses, covariates_list, IV = "female") {
  models <- list()
  
  for (response in responses) {
    for (covariates in covariates_list) {
      # Use glue to correctly format the formula components
      formula_terms <- c(
        glue("{IV}_con"),
        glue("{IV}*treated"),
        glue("{IV}*surprise"),
        glue("{IV}")
      )
      
      # Base formula terms
      base_formula <- c("surprise*treated", formula_terms)
      
      # Append covariates if provided
      if (!is.null(covariates)) {
        formula_terms <- c(base_formula, covariates)
      } else {
        formula_terms <- base_formula
      }
      
      # Create the formula
      formula <- reformulate(formula_terms, response = response)
      
      # Define the variables you want to extract from the model
      vars <- c(glue("surpriseTRUE:treatedTRUE"), glue("{IV}_con"))
      
      # Add the formula and vars to the list of models
      models <- append(models, list(list(formula = formula, 
                                         vars = vars, 
                                         response = response, 
                                         covariates = covariates, 
                                         IV = IV)))
    }
  }
  
  return(models)
}
"treatment*treated+female_both+female_int+female*treated+female*treatment+female"
"surprise*treated+female_con+female*treated+female*surprise+female"
# Function to run regression and extract coefficients
run_regression <- function(formula,
                           subset, subset_name,
                           vars, covariates, IV, response) {
  
  # Run the linear model
  model <- lm(formula, data = subset)
  
  # Get the confidence intervals for the specified variables
  coefs <- coefci(model, 
                  parm = vars, 
                  vcov = vcovCL(model, cluster = subset$participant.label, type = "HC1"), 
                  level = 0.95)
  
  # Extract coefficient estimates
  estimates <- model$coefficients[vars]
  
  # Create the result table with additional columns for covariates, IV, and response
  result <- data.table(
    variable = vars,
    estimate = estimates,
    conf.low = coefs[, 1],
    conf.high = coefs[, 2],
    covariates = ifelse(test = is.null(covariates), yes = 0, no = length(covariates)),
    IV = IV,
    response = response,
    subset = subset_name
  )
  
  return(result)
}
models <- create_models(DVs, covariates_list, IV = "high_credibility")

# Run regressions and combine results
results <- rbindlist(lapply(models, 
                            function(model) {
                              run_regression(formula = as.formula(model$formula), 
                                             subset = data, 
                                             subset_name = "full",
                                             vars = model$vars,
                                             covariates = model$covariates,
                                             IV = model$IV,
                                             response = model$response)
                            }
                            )
                     )

results %>% kable()
variable estimate conf.low conf.high covariates IV response subset
surpriseTRUE:treatedTRUE -0.0388621 -0.0179214 0.1987169 0 high_credibility a full
high_credibility_con 0.0903977 -0.1161782 0.0384540 0 high_credibility a full
surpriseTRUE:treatedTRUE -0.0206460 -0.0316047 0.1898641 7 high_credibility a full
high_credibility_con 0.0791297 -0.0978072 0.0565152 7 high_credibility a full
surpriseTRUE:treatedTRUE -0.0206460 -0.0317072 0.1899666 13 high_credibility a full
high_credibility_con 0.0791297 -0.0978786 0.0565866 13 high_credibility a full
surpriseTRUE:treatedTRUE 0.0073134 -0.0368519 0.0559084 0 high_credibility b full
high_credibility_con 0.0095282 -0.0244345 0.0390613 0 high_credibility b full
surpriseTRUE:treatedTRUE 0.0070890 -0.0330513 0.0653574 7 high_credibility b full
high_credibility_con 0.0161530 -0.0271233 0.0413012 7 high_credibility b full
surpriseTRUE:treatedTRUE 0.0070890 -0.0330969 0.0654029 13 high_credibility b full
high_credibility_con 0.0161530 -0.0271550 0.0413329 13 high_credibility b full
heterogeneity_full_list <- list()

for(iv in IVs){
  models <- create_models(DVs, covariates_list, IV = iv)

  # Run regressions and combine results
  results <- rbindlist(lapply(models, 
                              function(model) {
                                run_regression(formula = as.formula(model$formula), 
                                               subset = data, 
                                               subset_name = "full",
                                               vars = model$vars,
                                               covariates = model$covariates,
                                               IV = model$IV,
                                               response = model$response)
                              }
                              )
                       )
  heterogeneity_full_list[[length(heterogeneity_full_list) + 1]] <- results
}

heterogeneity_full <- rbindlist(l = heterogeneity_full_list)
heterogeneity_full[, 
                   var_name := paste0(response, 
                                      " (", 
                                      ifelse(test = variable == "surpriseTRUE:treatedTRUE", 
                                             yes = paste0("not ", IV), 
                                             no = paste0(IV, " DDD")),
                                      ")")]

heterogeneity_full[covariates == 0,
                   controls := "1.) none"]
heterogeneity_full[covariates == 7,
                   controls := "2.) demographic"]
heterogeneity_full[covariates == 13,
                   controls := "3.) demographic + further"]
heterogeneity_full[, controls := factor(controls, 
                                        levels = c("3.) demographic + further",
                                                   "2.) demographic",
                                                   "1.) none"))]
# Order the y_label factor levels
# y_order <- c("a (not female)","a (female DDD)", "b (not female)", "b (female DDD)","a (less accurate)","a (accurate DDD)", "b (less accurate)", "b (accurate DDD)","a (less credible)","a (credible DDD)", "b (less credible)", "b (credible DDD)","a (less forecast usage)","a (forecast usage DDD)", "b (less forecast usage)", "b (forecast usage DDD)","a (lower temperature)","a (temperature DDD)", "b (lower temperature)", "b (temperature DDD)","a (lower education)","a (education DDD)", "b (lower education)", "b (education DDD)")
# 
# heterogeneity_full[, y_label := factor(y_order, levels = rev(y_order))]

# Create the plot
ggplot(data = heterogeneity_full, 
       mapping = aes(y = var_name, 
                     x = estimate, 
                     xmin = conf.low, 
                     xmax = conf.high, 
                     shape = controls)) +
  geom_pointrange(position = position_dodge(width = 0.4), fatten = 5, alpha = 0.8) +
  geom_vline(xintercept = 0, color = "red", alpha = 0.2) +
  labs(title = "(a) Effect of contradiction (relative to confirmation).",
       y = "Ambiguity Index", 
       x = "Estimate", 
       shape = "Control variables") +
  theme_bw() +
  theme(
    plot.title = element_text(face = "bold", size = 12),
    legend.background = element_rect(fill = "white", size = 4, colour = "white"),
    axis.ticks = element_line(colour = "grey90", size = 0.1),
    panel.grid.major = element_line(colour = "grey90", size = 0.07),
    panel.grid.minor = element_blank(),
    legend.key.size = unit(1, "lines"),
    legend.box = 'horizontal',
    legend.position = "top"
  )

Figure 6.1: Heterogeneous treatment effects of regression equation (1) with dependent variables b and a. Estimators with 95% confidence intervals. The underlying standard errors (“HC1”) are clustered at the individual level and estimated with the R package sandwich (Zeileis, 2004; Zeileis et al., 2020).

Session Info

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] knitr_1.48        glue_1.7.0        sandwich_3.1-0    lmtest_0.9-40    
 [5] zoo_1.8-12        ggplot2_3.5.1     Rmisc_1.5.1       plyr_1.8.9       
 [9] lattice_0.22-6    stringr_1.5.1     data.table_1.15.4 magrittr_2.0.3   

loaded via a namespace (and not attached):
 [1] gtable_0.3.5      jsonlite_1.8.8    dplyr_1.1.4       compiler_4.4.1   
 [5] tidyselect_1.2.1  Rcpp_1.0.13       parallel_4.4.1    scales_1.3.0     
 [9] yaml_2.3.10       fastmap_1.2.0     R6_2.5.1          labeling_0.4.3   
[13] generics_0.1.3    htmlwidgets_1.6.4 tibble_3.2.1      munsell_0.5.1    
[17] pillar_1.9.0      rlang_1.1.4       utf8_1.2.4        stringi_1.8.4    
[21] xfun_0.46         cli_3.6.3         withr_3.0.1       digest_0.6.36    
[25] grid_4.4.1        rstudioapi_0.16.0 lifecycle_1.0.4   vctrs_0.6.5      
[29] evaluate_0.24.0   farver_2.1.2      groundhog_3.2.0   fansi_1.0.6      
[33] colorspace_2.1-1  rmarkdown_2.27    tools_4.4.1       pkgconfig_2.0.3  
[37] htmltools_0.5.8.1